Land Use, Transit Flows and Demographic Patterns
Nutvara J, Kaitlyn Ng, Kittibhum Tasanasuwan
1. Background
Transportation policy-makers often face the challenge of choosing between various transportation scenarios. To do this, they need to understand the origin-destination (OD) flows of people. King County has diverse land use, including urban areas like Seattle, small towns like Issaquah, and rural areas like Snoqualmie. Population demographics vary along with land use. Therefore, understanding how demographics and land use affect travel patterns is crucial for urban planning and transportation infrastructure development.
This study aims to analyze the correlations between travel patterns, demographics, and land use. The study seeks to answer the following questions:
- What is the correlation between OD flows and demographics given similar land uses?
- How do land use and demographics affect the number of trips?
- Does the city provide a reasonable amount of transit to each census tract based on demographics and land use types?
2. Data
Our data came from three sources for King County in 2019.
- Travel flow data from the Puget Sound Regional Council Household Travel Survey
  - Variables extracted: number of transit trips originating and ending in each census tract in King County
  - Workflow: created origin-destination matrix for census tracts in King County (sparsity issue), filtered data to only include transit trips
- Land use data from the General Land Use Final Dataset
  - Variables extracted: largest land use of each census tract
  - Workflow: clipped land use data by census tract geometries to determine the land use with the largest area proportion
- Demographic data from the [American Community Survey](https://www.census.gov/programs-surveys/acs)
  - Variables extracted: various demographic data (race, income, immigration status, housing costs, mode of travel to work, etc.)
  - Workflow: identified variables of interest, queried from Census API
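The travel flow workflow described in the list can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`origin_tract`, `dest_tract`, `mode`, `weight`) and the toy records are assumptions and do not reflect the real survey schema.

```python
import pandas as pd

# Hypothetical trip records; column names and values are illustrative only.
trips = pd.DataFrame({
    "origin_tract": ["A", "A", "B", "C", "A"],
    "dest_tract":   ["B", "C", "A", "A", "B"],
    "mode":         ["transit", "car", "transit", "transit", "transit"],
    "weight":       [10.0, 5.0, 7.0, 3.0, 2.0],  # assumed trip expansion weight
})

# Keep transit trips only.
transit = trips[trips["mode"] == "transit"]

# OD matrix: rows are origin tracts, columns are destination tracts.
# Tract pairs with no observed trips are filled with 0, so the matrix is sparse.
od_matrix = transit.pivot_table(
    index="origin_tract", columns="dest_tract",
    values="weight", aggfunc="sum", fill_value=0,
)

# Per-tract totals, analogous to the origin_flow and dest_flow variables.
origin_flow = transit.groupby("origin_tract")["weight"].sum()
dest_flow = transit.groupby("dest_tract")["weight"].sum()
```

Tracts that never appear as an origin or destination are absent from these totals, which is one way missing flow values can arise after merging with the full tract list.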
Load and Merge Data
3. Analysis & Discussion
3.1 Correlation matrix between demographic and OD flow
Correlation coefficients measure the strength and direction of the linear relationship between two variables. The coefficient takes a value between -1 and 1, where
- 1 indicates a perfect positive linear relationship.
- -1 indicates a perfect negative linear relationship.
- 0 indicates no linear relationship.
In this study, we want to observe the relationship between demographics and OD flows categorized by different land uses.
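As a rough sketch of this idea, the Pearson correlation between a flow variable and a demographic variable can be computed separately for each land use group with pandas. The toy values below are made up for illustration, and the variable `corr_by_land_use` is a hypothetical name.

```python
import pandas as pd

# Toy tract table; the values are invented for illustration.
tracts = pd.DataFrame({
    "land_use": ["Intensive Urban"] * 4 + ["Industrial"] * 4,
    "origin_flow": [10.0, 20.0, 30.0, 40.0, 40.0, 30.0, 20.0, 10.0],
    "% ToWork:Transit": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
})

# For each land use, correlate every numeric column with origin_flow.
# DataFrame.corr() uses the Pearson coefficient by default.
corr_by_land_use = {
    name: group.drop(columns="land_use").corr()["origin_flow"].drop("origin_flow")
    for name, group in tracts.groupby("land_use")
}
```

In this toy data the two groups have opposite signs, which shows why a single pooled correlation could hide group-specific patterns.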
Data Preparations
| Column | dtype |
|---|---|
| Tract | object |
| tractid | int64 |
| origin_flow | float64 |
| dest_flow | float64 |
| Name | object |
| Density | float64 |
| % Race:WhiteAlone | float64 |
| % Race:BlackAlone | float64 |
| % Race:AsianAlone | float64 |
| % Race:Other | float64 |
| % BelowPovertyLevel | float64 |
| % Immigration | float64 |
| % Umemployment | float64 |
| % ToWork:Car | float64 |
| % ToWork:Transit | float64 |
| % ToWork:Bike | float64 |
| % ToWork:Walk | float64 |
| % ToWork:WFH | float64 |
| RatioMedianGrossRentToIncome | float64 |
| RatioMedianHomeValueToIncome | float64 |
| geometry | geometry |
| land_use | object |

| | origin_flow | dest_flow | Density | % Race:WhiteAlone | % Race:BlackAlone | % Race:AsianAlone | % Race:Other | % BelowPovertyLevel | % Immigration | % Umemployment | % ToWork:Car | % ToWork:Transit | % ToWork:Bike | % ToWork:Walk | % ToWork:WFH | RatioMedianGrossRentToIncome | RatioMedianHomeValueToIncome | land_use |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 272.95695 | 272.95695 | 1730.720957 | 52.318286 | 14.521806 | 14.322877 | 18.837031 | 9.854629 | 18.944147 | 2.876817 | 52.639633 | 5.554705 | 0.000000 | 0.336649 | 1.438409 | 0.022329 | 5.293762 | Urban Character Residential |
| 1 | NaN | NaN | 1239.464041 | 62.771662 | 8.947220 | 18.204911 | 10.076207 | 3.810330 | 4.826418 | 2.681343 | 48.461756 | 3.415185 | 0.000000 | 0.310471 | 3.358736 | 0.017208 | 3.646323 | Natural Preservation and Conservation |
| 2 | NaN | NaN | 1492.774647 | 59.607732 | 6.310404 | 12.734508 | 21.347356 | 7.561114 | 9.238204 | 1.705514 | 44.030699 | 2.046617 | 0.000000 | 0.255827 | 2.956225 | 0.020622 | 3.848660 | Urban Character Residential |
| 3 | 0.00000 | 0.00000 | 2078.075797 | 37.876106 | 22.973451 | 26.070796 | 13.079646 | 3.185841 | 11.469027 | 2.371681 | 45.451327 | 5.380531 | 0.000000 | 0.230088 | 3.238938 | 0.018121 | 4.529858 | Undesignated |
| 4 | NaN | NaN | 1968.115919 | 21.297284 | 25.056326 | 37.578560 | 16.067829 | 14.704139 | 18.024428 | 1.138385 | 41.088581 | 9.107079 | 0.166014 | 0.154156 | 1.921025 | 0.025035 | 6.767804 | Intensive Urban |
367

Tract counts by land use:

| land_use | count |
|---|---|
| Urban Character Residential | 218 |
| Intensive Urban | 106 |
| Industrial | 21 |
| Rural Character Residential | 9 |
| Active Open Space and Recreation | 7 |
| Natural Preservation and Conservation | 3 |
| Public | 3 |

Tract counts after grouping the four smallest categories into Other:

| land_use | count |
|---|---|
| Urban Character Residential | 218 |
| Intensive Urban | 106 |
| Other | 22 |
| Industrial | 21 |
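One way to reproduce the grouping of small land use categories into Other is sketched below. The counts are taken from the output above, but the cut-off `MIN_TRACTS = 20` is an assumption: the notebook does not state which rule it used.

```python
import pandas as pd

# Counts copied from the land use distribution shown above.
counts = {
    "Urban Character Residential": 218,
    "Intensive Urban": 106,
    "Industrial": 21,
    "Rural Character Residential": 9,
    "Active Open Space and Recreation": 7,
    "Natural Preservation and Conservation": 3,
    "Public": 3,
}
# Rebuild a one-row-per-tract series from the counts.
land_use = pd.Series(
    [name for name, n in counts.items() for _ in range(n)], name="land_use"
)

MIN_TRACTS = 20  # assumed cut-off, not stated in the notebook

# Relabel every category with fewer than MIN_TRACTS tracts as "Other".
freq = land_use.value_counts()
rare = freq[freq < MIN_TRACTS].index
grouped = land_use.where(~land_use.isin(rare), "Other")
```

Grouping keeps the per-category sample sizes large enough for a correlation matrix to be meaningful, at the cost of mixing quite different land uses in one bucket.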
3.1.1 Urban Character Residential Area
In Urban Character Residential tracts, OD flows show a slightly higher correlation with the percentages of the population commuting by transit and on foot than with the other variables. This suggests that tracts where more residents take transit or walk to work also tend to have higher transit OD flows.
3.1.2 Intensive Urban Area
In Intensive Urban tracts, OD flows are positively correlated with the percentage walking to work, the rent-to-income ratio, and the home-value-to-income ratio, and negatively correlated with the percentage driving to work. This indicates a preference for walking over driving in these areas. OD flows also tend to be higher where rents and home values are high relative to income.
3.1.3 Industrial Area
In Industrial tracts, every commute mode has a strong positive correlation with OD flows except driving, which has a strong negative correlation. Unemployment also shows a noticeable negative correlation with OD flows. In other words, OD flows are higher where the shares of alternative modes are higher, which reflects the population's reliance on alternative modes for commuting, and where unemployment is lower.
3.1.4 Other Areas
In Other tracts, the percentage of Black residents, the percentage below the poverty level, and the rent-to-income ratio have strong positive correlations with OD flows. Driving to work shows a strong negative correlation with OD flows. This indicates that tracts with larger shares of Black residents, more residents below the poverty level, higher rent burdens, or fewer car commuters tend to have higher OD flows.
3.2 OLS
Linear regression is a statistical technique used to describe relationships among variables. It models the dependent variable (y) as a linear function of one or more independent variables (x). The formula is given as:
$Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_p X_p + \epsilon$
Where
- $Y$ = the dependent or predicted variable
- $B_0$ = the y-intercept
- $B_1$ and $B_2$ = regression coefficients representing the change in $Y$ for a one-unit change in $X_1$ and $X_2$, respectively, holding the other variables constant
- $B_p$ = the regression coefficient for the $p$-th independent variable $X_p$
- $\epsilon$ = the model's random error (residual) term
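To make the formula concrete, the sketch below fits the coefficients by ordinary least squares on synthetic data using NumPy. It is an illustration under assumed values (the true coefficients, sample size, and noise level are all invented), not the notebook's code, which reports statsmodels summaries.

```python
import numpy as np

# Synthetic data generated from known coefficients (all values are assumptions).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                      # X_1 and X_2
true_beta = np.array([1.5, 2.0, -0.5])           # B_0, B_1, B_2
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones so that B_0 is estimated too.
design = np.column_stack([np.ones(n), X])

# Ordinary least squares: minimise the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# Residuals are the estimates of the error term epsilon.
residuals = y - design @ beta_hat
```

With an intercept column in the design matrix, the residuals average to zero, which is one quick sanity check on a fit.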
This study uses regression analysis to answer the following questions:
- Is race related to the number of trips?
- Which land use types affect the number of trip origins and trip destinations?
- Which factor affects the number of trips more: unemployment or poverty level?
- Which land use types have a higher share of commuting by transit? What about working from home (WFH)?
3.2.0 Clean data
Remove rows with NaN values and rows whose land use is 'N/A' or 'Undesignated'.
Inspect the distribution of land use types.
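A minimal sketch of these two cleaning steps, assuming a DataFrame with the `origin_flow`, `dest_flow`, and `land_use` columns shown earlier. The toy rows and the variable names `clean` and `distribution` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the merged tract table.
df = pd.DataFrame({
    "origin_flow": [272.9, np.nan, 0.0, 15.0, 8.0],
    "dest_flow": [272.9, np.nan, 0.0, 12.0, 9.0],
    "land_use": [
        "Urban Character Residential", "Intensive Urban",
        "Undesignated", "N/A", "Industrial",
    ],
})

# Step 1: drop rows with any missing value.
clean = df.dropna()

# Step 2: drop rows whose land use carries no usable category.
excluded = ["N/A", "Undesignated"]
clean = clean[~clean["land_use"].isin(excluded)].reset_index(drop=True)

# Inspect how many tracts remain per land use type.
distribution = clean["land_use"].value_counts()
```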
3.2.1 Race vs Trip Number
The dependent variable is the sum of origin flow and destination flow. The independent variables are the percentages of each race in each census tract.
| Dep. Variable: | y | R-squared: | 0.023 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.008 |
| Method: | Least Squares | F-statistic: | 1.539 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 0.206 |
| Time: | 12:06:31 | Log-Likelihood: | -2118.7 |
| No. Observations: | 198 | AIC: | 4245. |
| Df Residuals: | 194 | BIC: | 4258. |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| % Race:WhiteAlone | 49.7496 | 19.213 | 2.589 | 0.010 | 11.856 | 87.644 |
| % Race:BlackAlone | 141.2025 | 99.646 | 1.417 | 0.158 | -55.326 | 337.731 |
| % Race:AsianAlone | 147.1504 | 56.369 | 2.610 | 0.010 | 35.976 | 258.325 |
| % Race:Other | -80.6080 | 101.507 | -0.794 | 0.428 | -280.807 | 119.591 |
| Omnibus: | 198.933 | Durbin-Watson: | 2.009 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 4251.077 |
| Skew: | 3.975 | Prob(JB): | 0.00 |
| Kurtosis: | 24.262 | Cond. No. | 11.1 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
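A positive coefficient indicates that the dependent variable increases as the predictor increases. Here, the coefficients for the White, Black, and Asian percentages are positive, so the total number of trips tends to rise as these shares increase, while the coefficient for Other is negative. Only the White and Asian coefficients are significant at the 5% level (p = 0.010 each). The Asian percentage has the largest coefficient (147.15), so tracts with a larger Asian share tend to produce the most trips. The overall fit is weak, however (R-squared = 0.023, Prob (F-statistic) = 0.206).
The plots below show the relationship between the dominant race in a tract and the number of trip origins.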
3.2.2 Trip Number vs Land Use Type
This analysis uses origin flow and destination flow as the dependent variable. The independent variables are the land use types (categorical), encoded as dummy variables.
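The dummy encoding and a regression without a constant can be sketched as follows. The toy data and the `total_flow` column name are assumptions for illustration. The sketch also shows a general property worth keeping in mind when reading such output: when the only regressors are a full set of category dummies and there is no intercept, each OLS coefficient equals the mean of the dependent variable within that category.

```python
import numpy as np
import pandas as pd

# Toy data; total_flow is a hypothetical name for the dependent variable.
df = pd.DataFrame({
    "land_use": ["Industrial", "Industrial",
                 "Intensive Urban", "Intensive Urban", "Intensive Urban"],
    "total_flow": [100.0, 300.0, 500.0, 700.0, 900.0],
})

# One 0/1 column per land use type.
dummies = pd.get_dummies(df["land_use"], prefix="land_use", dtype=float)

# OLS without a constant: regress total_flow on the dummy columns only.
coef, *_ = np.linalg.lstsq(
    dummies.to_numpy(), df["total_flow"].to_numpy(), rcond=None
)
coef = pd.Series(coef, index=dummies.columns)

# With dummies only, each coefficient is that category's mean flow.
group_means = df.groupby("land_use")["total_flow"].mean()
```

Read this way, a large coefficient for a category with very few tracts is simply a mean over a small sample, and it comes with a wide confidence interval.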
| | land_use_Industrial | land_use_Intensive Urban | land_use_Rural Character Residential | land_use_Urban Character Residential |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 |
| 9 | 0 | 1 | 0 | 0 |
| 12 | 0 | 0 | 0 | 0 |
| 13 | 1 | 0 | 0 | 0 |
| 14 | 0 | 0 | 0 | 1 |
| Dep. Variable: | y | R-squared (uncentered): | 0.236 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.220 |
| Method: | Least Squares | F-statistic: | 14.94 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 1.16e-10 |
| Time: | 12:06:53 | Log-Likelihood: | -2119.2 |
| No. Observations: | 198 | AIC: | 4246. |
| Df Residuals: | 194 | BIC: | 4260. |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| land_use_Industrial | 5250.1097 | 2718.834 | 1.931 | 0.055 | -112.158 | 1.06e+04 |
| land_use_Intensive Urban | 7460.4255 | 1208.371 | 6.174 | 0.000 | 5077.196 | 9843.655 |
| land_use_Rural Character Residential | 9779.1678 | 7690.023 | 1.272 | 0.205 | -5387.615 | 2.49e+04 |
| land_use_Urban Character Residential | 4481.7169 | 1109.959 | 4.038 | 0.000 | 2292.580 | 6670.853 |
| Omnibus: | 191.464 | Durbin-Watson: | 2.040 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 3517.431 |
| Skew: | 3.816 | Prob(JB): | 0.00 |
| Kurtosis: | 22.186 | Cond. No. | 6.93 |
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
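The note about uncentered R² matters when comparing fits. The sketch below, on invented data, computes both versions for a model with dummy regressors and no constant. The uncentered version divides by the sum of squared values of y rather than by the variation around the mean, so it is typically larger and is not directly comparable with the centered R² reported for models that include a constant.

```python
import numpy as np

# Toy dependent variable and group membership (values are invented).
y = np.array([100.0, 300.0, 500.0, 700.0, 900.0])
groups = np.array([0, 0, 1, 1, 1])

# Dummy design matrix without a constant column.
X = np.eye(2)[groups]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sum of squared residuals is shared by both definitions.
ssr = np.sum((y - X @ beta) ** 2)

# Uncentered: total sum of squares measured around zero.
r2_uncentered = 1 - ssr / np.sum(y ** 2)

# Centered: total sum of squares measured around the mean of y.
r2_centered = 1 - ssr / np.sum((y - y.mean()) ** 2)
```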
The box plot shows that Intensive Urban and Urban Character Residential tracts have a wide range of trip counts. However, many of the data points are outliers, which indicates that a few areas generate far more trips than usual. The regression suggests that Rural Character Residential tracts generate the most trips on average (coefficient 9779.17), although this estimate is not statistically significant (p = 0.205). This makes sense, as facilities may be limited in rural areas, so people need to travel for their activities.
3.2.3 Which factor affects the number of trips more: unemployment or poverty level?
This analysis uses origin flow and destination flow as the dependent variable. The independent variables are the percentage of the population that is unemployed and the percentage of the population below the poverty level.
[Figure: scatter plot of % Unemployment vs % BelowPovertyLevel]
| Dep. Variable: | y | R-squared (uncentered): | 0.234 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.226 |
| Method: | Least Squares | F-statistic: | 29.89 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 4.69e-12 |
| Time: | 12:07:05 | Log-Likelihood: | -2119.4 |
| No. Observations: | 198 | AIC: | 4243. |
| Df Residuals: | 196 | BIC: | 4249. |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| % Umemployment | 894.5723 | 461.404 | 1.939 | 0.054 | -15.381 | 1804.526 |
| % BelowPovertyLevel | 282.8231 | 108.976 | 2.595 | 0.010 | 67.907 | 497.739 |
| Omnibus: | 190.443 | Durbin-Watson: | 2.015 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 3486.356 |
| Skew: | 3.785 | Prob(JB): | 0.00 |
| Kurtosis: | 22.112 | Cond. No. | 7.84 |
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
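The scatter plot shows that the unemployment percentage and the percentage of the population below the poverty level are related. However, the two factors have different effects on the number of trips. In the regression, the coefficient for unemployment (894.57) is larger than the coefficient for poverty (282.82), so a one-point increase in unemployment is associated with more additional trips than a one-point increase in poverty, although the unemployment coefficient is only marginally significant (p = 0.054). One possible explanation is that people below the poverty level are less able to afford travel expenses than unemployed people.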
3.2.4 Which land use types have a higher share of commuting by transit? What about WFH?
This analysis uses the percentage of people who commute by transit or the percentage of people who work from home (WFH) as the dependent variable. The independent variables are the land use types (categorical).
| Dep. Variable: | % ToWork:Transit | R-squared (uncentered): | 0.792 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.788 |
| Method: | Least Squares | F-statistic: | 184.6 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 5.77e-65 |
| Time: | 12:07:23 | Log-Likelihood: | -611.87 |
| No. Observations: | 198 | AIC: | 1232. |
| Df Residuals: | 194 | BIC: | 1245. |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| land_use_Industrial | 12.0806 | 1.343 | 8.992 | 0.000 | 9.431 | 14.730 |
| land_use_Intensive Urban | 12.2899 | 0.597 | 20.583 | 0.000 | 11.112 | 13.468 |
| land_use_Rural Character Residential | 7.0098 | 3.800 | 1.845 | 0.067 | -0.484 | 14.504 |
| land_use_Urban Character Residential | 8.3243 | 0.548 | 15.178 | 0.000 | 7.243 | 9.406 |
| Omnibus: | 4.138 | Durbin-Watson: | 1.520 |
|---|---|---|---|
| Prob(Omnibus): | 0.126 | Jarque-Bera (JB): | 4.141 |
| Skew: | 0.320 | Prob(JB): | 0.126 |
| Kurtosis: | 2.694 | Cond. No. | 6.93 |
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
| Dep. Variable: | % ToWork:WFH | R-squared (uncentered): | 0.799 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.795 |
| Method: | Least Squares | F-statistic: | 192.8 |
| Date: | Mon, 04 Mar 2024 | Prob (F-statistic): | 2.04e-66 |
| Time: | 12:07:27 | Log-Likelihood: | -411.41 |
| No. Observations: | 198 | AIC: | 830.8 |
| Df Residuals: | 194 | BIC: | 844.0 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| land_use_Industrial | 2.8205 | 0.488 | 5.778 | 0.000 | 1.858 | 3.783 |
| land_use_Intensive Urban | 4.0694 | 0.217 | 18.758 | 0.000 | 3.642 | 4.497 |
| land_use_Rural Character Residential | 2.2782 | 1.381 | 1.650 | 0.101 | -0.445 | 5.001 |
| land_use_Urban Character Residential | 3.9001 | 0.199 | 19.572 | 0.000 | 3.507 | 4.293 |
| Omnibus: | 7.055 | Durbin-Watson: | 1.795 |
|---|---|---|---|
| Prob(Omnibus): | 0.029 | Jarque-Bera (JB): | 6.978 |
| Skew: | 0.457 | Prob(JB): | 0.0305 |
| Kurtosis: | 3.096 | Cond. No. | 6.93 |
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The percentage of people who work from home is lower than the percentage who commute by transit in every land use type. Transit commuting is most common in Industrial and Intensive Urban areas, which mostly have good transit facilities that mainly serve workers. People who live in Intensive Urban areas are also the most likely to work from home, which suggests that these areas contain both offices and residences.
3.3 Mapping Analysis
These graphs will provide insight on the relationship between OD flows, demographics, and land use.
What is the relationship between transit flow patterns and demographics?
Transit OD flows are plotted via scikit-mobility, a Python library for human mobility analysis. Its visualizations are built on top of folium.